Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move container build infrastructure to Ansible #1009

Merged
merged 11 commits into from
Jul 18, 2024

Conversation

ikerexxe
Copy link
Collaborator

Using dockerfiles to build the containers in the CI prevented us from getting the logs when any of the build steps failed. Using Ansible will help us tackle this problem while still maintaining a certain independence from Github Actions.

Using a dockerfile to build, install and test the code can be
problematic as we can't capture the log files to check what failed in
case of failure. This PR converts the fedora dockerfile to Ansible, an
open source IT automation tool. The tool can be used on the developers
and the CI system to check whether a piece of code can be built,
installed and tested.

This is the first patch in a series, where I will convert the existing
PR workflows to use Ansible instead of dockerfiles.

Signed-off-by: Iker Pedrosa <[email protected]>
share/container-build.sh Fixed Show fixed Hide fixed
share/container-build.sh Fixed Show fixed Hide fixed
@alejandro-colomar
Copy link
Collaborator

alejandro-colomar commented May 29, 2024

These ansible(1) playbooks still build Docker images, so I guess under the hood they still do the same thing that a Dockerfile and docker(1) could do. Can't we achieve the same with just docker(1) and a Dockerfile?

That is, can you explain how is ansible(1) necessary (or better than docker(1)) to achieve this?

@ikerexxe
Copy link
Collaborator Author

Yes, they do the same, and if I could I would have stick with the docker and dockerfile architecture, but it isn't possible. At least if we want to obtain the logs without having to do some nasty things, like 1 and 2 (which in reality doesn't work because when something fails the container build never arrives to this point).

In addition, when dockerfile build fails the container is destroyed, which prevents us from accessing it to see what has happened. Which brings us to the previous point, getting the logs.

Using Ansible allows us to maintain a similar approach to what we had until now: build this project for different distributions, run the workflow locally, maintain some independence from Github actions, etc. On top of that, we'll be able to obtain all the logs, and if we are running the workflow locally and anything fails we'll be able to access the containers to debug what's happening.

@alejandro-colomar
Copy link
Collaborator

alejandro-colomar commented May 30, 2024

Yes, they do the same, and if I could I would have stick with the docker and dockerfile architecture, but it isn't possible. At least if we want to obtain the logs without having to do some nasty things, like 1 and 2 (which in reality doesn't work because when something fails the container build never arrives to this point).

This does actually work (I could see the logs). I've seen it fail once (not exactly this; it was the same trick, but in a github actions), and documented it here:
#956 (comment)

And I wrote some small Dockerfile to test this with a docker build:

$ cat Dockerfile 
FROM debian

RUN bash -c "trap 'cat </etc/os-release >&2' ERR; false;"
$ sudo docker build .
Sending build context to Docker daemon  23.55kB
Step 1/2 : FROM debian
 ---> 2a033a8c6371
Step 2/2 : RUN bash -c "trap 'cat </etc/os-release >&2' ERR; false;"
 ---> Running in 3c6f20ae7c6b
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
The command '/bin/sh -c bash -c "trap 'cat </etc/os-release >&2' ERR; false;"' returned a non-zero code: 1

Or do you have any case where it failed and it didn't work?


Edit: Now I realize you probably meant the COPY trick is the one that doesn't work). That's probably true. But stderr should be good-enough.

I want to avoid the complexity of using ansible(1) if possible, and if trap(1) works, I prefer it.

@alejandro-colomar
Copy link
Collaborator

As an alternative, we could consider that make check could directly print to stderr, instead of a log. Let the user redirect the logs if it wants to. This is very much preferred behavior, IMO. Then we wouldn't need to do trap(1) magic to print the logs on failure.

@ikerexxe
Copy link
Collaborator Author

The trap works, but it feels nasty and it forces users to use the Github scroll, which doesn't work very well.

Regarding COPY, that doesn't work and we do not only want to see the build logs but also the configuration files. This would feel like an improvement to me considering our current situation.

Finally, this would also give us the opportunity of capturing all the test logs when #835 is implemented and running.

make check
args:
chdir: /usr/local/src/shadow/
ignore_errors: true
Copy link
Collaborator

@alejandro-colomar alejandro-colomar May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this exactly mean? Why do we want to ignore errors?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default if an ansible action fails, then the ansible execution is stopped. I want to ignore the errors and continue the execution to run the last action where the logs are copied from the container to the host system. This way we can gather them for inspection.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But does it report the error later? How will we know if ansible failed, if we ignore the errors? Sorry if this is obvious; I never used Ansible before.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is reported, but the execution continues. At the end of the ansible execution there's the PLAY RECAP where we'll be able to check if anything failed. I created #1014 with an intentional failure to show how it works.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I forgot to set if: always() in the Github Action. It's fine now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It seems to work now. BTW, dumb question: how do I find and read the logs (artifacts)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to open the action that failed, then click on summary, and finally scroll down to find all the artifacts

Copy link
Collaborator

@alejandro-colomar alejandro-colomar Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ughhh, and then download a .zip and extract it to find the logs. Can we (also) have a copy on stderr? I very much prefer scrolling. :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will work until we start running the new tests and have several files to review

Comment on lines 9 to 15
Usage example:

- hosts: builder
connection: podman
become: true
roles:
- role: ci_run
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this usage. Where is this YAML code supposed to be used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the ansible playbook: https://github.com/shadow-maint/shadow/pull/1009/files#diff-1968d6dc5c6937169e73de5fa2e83bf0d9810afcec288aa45b6cfabb40a5bf15R12-R17

Roles are self-contained units in Ansible. They are used for grouping tasks and other resources in a known file structure.

Create `build_container` and `ci_run` roles and move the fedora target
to them.

Signed-off-by: Iker Pedrosa <[email protected]>
@ikerexxe ikerexxe force-pushed the ansible_ci branch 2 times, most recently from c8ceb95 to db3c30c Compare June 5, 2024 10:49
Distribution to run can be selected when running `ansible-playbook` by
appending `-e 'distribution=fedora'` to the command.

Signed-off-by: Iker Pedrosa <[email protected]>
@ikerexxe ikerexxe force-pushed the ansible_ci branch 3 times, most recently from 7a426fe to ef6fbcf Compare June 5, 2024 14:21
- name: Build container
run: |
docker buildx build -f ./share/containers/${{ matrix.os }}.dockerfile . --output build-out
Copy link
Collaborator

@alejandro-colomar alejandro-colomar Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, what if we would map some volume to write the log files to it with docker(1)?

RUN bash -c "trap 'cat <tests/unit/test-suite.log | tee /some/mapped/volume/test-suite.log >&2' ERR; make check;"

And use https://docs.docker.com/storage/volumes/.

Copy link
Collaborator

@alejandro-colomar alejandro-colomar Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would allow storing the artifacts without needing Ansible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't mount volumes while building a container image, and that's exactly what we are doing with dockerfile. I have tried to solve this problem in various ways and with the technology we use it is not possible 😓

Copy link
Collaborator

@alejandro-colomar alejandro-colomar Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm; understood.

How about something like this?

docker build ... | tee >(sed -n '/BEGIN MARKER/,/END MARKER/p' >/host/log/file)

(probably needs redirecting stderr too (or only))

We would only need to find some consistent markers in the log.

So, the full docker logs would go to the output, and the specific error logs that we want, which would be delimited by those delimiters, would go to a specific file in the host (so the runner).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer as in #1009 (comment)

Copy link
Collaborator

@alejandro-colomar alejandro-colomar Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If each log output has a different marker, you can parse each one separately:

docker build ... \
| tee \
    >(sed -n '/BEGIN A/,/END A/p' >/host/A.log) \
    >(sed -n '/BEGIN B/,/END B/p' >/host/B.log) \
    >(sed -n '/BEGIN C/,/END C/p' >/host/C.log);

@ikerexxe
Copy link
Collaborator Author

I think this is ready for review, so I'm moving it out of draft state.

@ikerexxe ikerexxe marked this pull request as ready for review June 13, 2024 07:56
@hallyn hallyn self-assigned this Jul 18, 2024
Copy link
Member

@hallyn hallyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks.

@hallyn hallyn merged commit fffa4d3 into shadow-maint:master Jul 18, 2024
9 checks passed
@ikerexxe ikerexxe deleted the ansible_ci branch July 19, 2024 06:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants